Abstract

We say that a q-ary length n code is non-overlapping if the set of
non-trivial prefixes of codewords and the set of non-trivial suffixes of
codewords are disjoint. (This property is often called cross-bifix-free.)

Bajic and Stojanovic were the first to consider non-overlapping
codes, motivated by applications requiring fast and reliable frame
synchronisation. Chee, Kiah, Purkayastha and Wang showed that a q-ary
length n non-overlapping code contains at most $q^n/(2n-1)$ codewords,
and gave a construction of a family of non-overlapping codes that are
within a constant factor of this upper bound when q is fixed.

We provide a simple combinatorial proof of the upper bound of 
Chee et al. (in fact, the upper bound is improved slightly). The con- 
struction of Chee et al. is good for small alphabet sizes, but per- 
forms much less well when the alphabet size q is large compared to 
the length n of the code. We provide a construction that generalises 
the construction of Chee et al. which performs well for all parameter 
sizes. More precisely, we show that our construction provides non- 
overlapping codes whose cardinality is within an (absolute) constant 
factor of the upper bound, for all parameters n and q. 

We also consider codes of short length, showing that the upper
bound of $q^n/(2n-1)$ is not always asymptotically tight. We provide
a conjecture on the leading term of the cardinality of an optimal
non-overlapping code when n is fixed and $q \to \infty$.
1 Introduction



Let u and v be two words (not necessarily distinct) of length n, over a finite
alphabet F of cardinality q. We say that u and v are overlapping if a
non-empty proper prefix of u is equal to a non-empty proper suffix of v, or if
a non-empty proper prefix of v is equal to a non-empty proper suffix of u.
So, for example, the binary words 00000 and 01111 are overlapping; so are
the words 10001 and 11110. However, the words 11111 and 01110 are
non-overlapping.
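The definition can be checked mechanically. The following is a minimal sketch, assuming words are represented as Python strings; the function name `overlapping` is illustrative and not taken from the paper. It compares every non-empty proper prefix of one word with the suffix of the same length of the other.

```python
def overlapping(u: str, v: str) -> bool:
    """Return True if a non-empty proper prefix of one word equals a
    non-empty proper suffix of the other (u and v may be the same word)."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    n = len(u)
    for t in range(1, n):  # t = length of the prefix/suffix being compared
        if u[:t] == v[-t:] or v[:t] == u[-t:]:
            return True
    return False


print(overlapping("00000", "01111"))  # True
print(overlapping("10001", "11110"))  # True
print(overlapping("11111", "01110"))  # False
```

Calling `overlapping(u, u)` tests whether a single word overlaps itself, which matters because the definition allows u = v.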

We say that a code $C \subseteq F^n$ is non-overlapping if for all (not necessarily
distinct) $u, v \in C$, the words u and v are non-overlapping. The following is an
example of a non-overlapping binary code of length 6 containing 3 codewords:

C = {001101,001011,001111}.

We write C(n,q) for the maximum number of codewords in a q-ary
non-overlapping code of length n. It is easy to see that C(1,q) = q. From now
on, to avoid trivialities, we always assume that $n \geq 2$.

Inspired by the use of distributed sequences in frame synchronisation 
applications by van Wijngaarden and Willink [7], Bajic and Stojanovic [2]
were the first to study non-overlapping codes. (Bajic and Stojanovic used 
the term cross-bifix-free sets rather than non-overlapping codes.) See also [H 
El m El El [7] for various aspects of non-overlapping (cross-bifix-free) codes 
and their applications to synchronisation. 

Chee, Kiah, Purkayastha and Wang [6] showed that

$$C(n,q) \leq \frac{q^n}{2n-1} \qquad (1)$$

and provided a construction of a class of non-overlapping codes whose
cardinality is within a constant factor of the bound (1) when the alphabet size q
is fixed and the length n tends to infinity.

Chee et al. established the bound (1) by appealing to the application
in synchronisation (deriving the bound from the fact that a certain variance
must be positive). In Section 2 below, we provide a direct combinatorial proof
of this bound. (Indeed, the combinatorial derivation allows us to improve the
bound slightly to a strict inequality.)

The construction of Chee et al. becomes poor for large alphabet sizes (in
the sense that the ratio between the number of codewords in the construction
and the upper bound tends to 0 as q increases). In Section 3 we provide
a simple generalisation of their construction which performs well when q
is large. Indeed, in Section 5 we show that this generalised construction
produces non-overlapping codes whose cardinality is within a constant factor
of the upper bound (1) even when the alphabet size q is allowed to grow.

In Section 4 we provide exact values for C(2,q) and C(3,q); these values
show that the bound (1) is not always asymptotically tight. We also state
a conjecture on the asymptotics of C(n,q) when n is fixed and q tends to
infinity.



2 An upper bound 

We provide a direct combinatorial proof of the following theorem, that slightly 
strengthens the bound due to Chee et al. [6]. 

Theorem 1. Let n and q be integers with $n \geq 2$ and $q \geq 2$. Let C(n,q) be the
number of codewords in the largest non-overlapping q-ary code of length n.
Then

$$C(n,q) < \frac{q^n}{2n-1}.$$



Proof. Let C be a non-overlapping code of length n over an alphabet F
with |F| = q. Consider the set X of pairs (w,i) where $w \in F^{2n-1}$,
$i \in \{1,2,\ldots,2n-1\}$ and the (cyclic) subword of w of length n starting at
position i lies in C. So, for example, if C is the code in the introduction then
$(01111110011, 8) \in X$.

We see that $|X| = (2n-1)|C|q^{n-1}$, since there are $2n-1$ choices for i,
then |C| choices for the codeword starting in the ith position of w, then $q^{n-1}$
choices for the remaining positions in w.

Since C is non-overlapping, two codewords cannot appear as distinct
cyclic subwords of any word w of length $2n-1$. Thus, for any $w \in F^{2n-1}$
there is at most one choice for an integer i such that $(w,i) \in X$. Moreover,
no subword of any of the q constant words w of length $2n-1$ can appear as
a codeword in a non-overlapping code. So $|X| \leq q^{2n-1} - q < q^{2n-1}$.

The theorem now follows from the inequality

$$(2n-1)|C|q^{n-1} = |X| < q^{2n-1}.$$ □
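The counting argument can be replayed by exhaustive enumeration on the length 6 binary example from the introduction. The sketch below is illustrative only (the helper names `cyclic_subword` and `pairs_X` are assumptions, not notation from the paper): it builds the set X explicitly, then checks the exact count of pairs and the fact that each word w contributes at most one pair.

```python
from itertools import product


def cyclic_subword(w: str, i: int, n: int) -> str:
    """Length-n cyclic subword of w starting at the 1-based position i."""
    m = len(w)
    return "".join(w[(i - 1 + j) % m] for j in range(n))


def pairs_X(code: set, q: int, n: int) -> list:
    """All pairs (w, i), w of length 2n-1, whose cyclic subword at i is a codeword."""
    alphabet = "0123456789"[:q]  # digits as symbols; enough for q <= 10
    X = []
    for letters in product(alphabet, repeat=2 * n - 1):
        w = "".join(letters)
        for i in range(1, 2 * n):
            if cyclic_subword(w, i, n) in code:
                X.append((w, i))
    return X


C = {"001101", "001011", "001111"}
n, q = 6, 2
X = pairs_X(C, q, n)

print(len(X) == (2 * n - 1) * len(C) * q ** (n - 1))  # exact count of pairs
print(len({w for w, _ in X}) == len(X))               # each w is used at most once
print(("01111110011", 8) in X)                         # the example from the proof
```

Since each w occurs at most once and the q constant words never occur, the number of pairs is at most $q^{2n-1} - q$, which is the inequality used in the proof.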
3 Constructions of non-overlapping codes



Let $F = \{0, 1, \ldots, q-1\}$. Chee et al. provide the following construction of a
non-overlapping code of length n over F.

Construction 1 (Chee et al. [6]). Let k be an integer such that $1 \leq k \leq n-1$.
Let C be the set of all words $c \in F^n$ such that:

• $c_i = 0$ for $1 \leq i \leq k$ (so all codewords start with k zeroes);

• $c_{k+1} \neq 0$, and $c_n \neq 0$;

• the sequence $c_{k+2}, c_{k+3}, \ldots, c_{n-1}$ does not contain k consecutive zeroes.

Then C is a non-overlapping code.
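For small parameters the three conditions of Construction 1 can be applied by brute force. The following sketch assumes the alphabet {0, ..., q-1} and represents words as tuples; the names `construction1` and `is_non_overlapping` are illustrative. It filters all words of length n, so it is only practical for small n and q.

```python
from itertools import product


def construction1(n: int, q: int, k: int) -> list:
    """Words of length n over {0,...,q-1} satisfying the three conditions."""
    zeros = (0,) * k
    code = []
    for c in product(range(q), repeat=n):
        middle = c[k + 1 : n - 1]  # c_{k+2}, ..., c_{n-1} in 1-based notation
        has_zero_run = any(
            middle[j : j + k] == zeros for j in range(len(middle) - k + 1)
        )
        if c[:k] == zeros and c[k] != 0 and c[-1] != 0 and not has_zero_run:
            code.append(c)
    return code


def is_non_overlapping(code: list) -> bool:
    """Check that non-trivial prefixes and non-trivial suffixes are disjoint."""
    if not code:
        return True
    n = len(code[0])
    prefixes = {c[:t] for c in code for t in range(1, n)}
    suffixes = {c[-t:] for c in code for t in range(1, n)}
    return prefixes.isdisjoint(suffixes)


# Binary codes of length 6 for each admissible k.
for k in range(1, 6):
    code = construction1(6, 2, k)
    print(k, len(code), is_non_overlapping(code))
```

For n = 6 and q = 2 the choice k = 2 gives the three codewords 001011, 001101 and 001111, which is the example code from the introduction.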

It is not hard to see that the construction above is indeed a non-overlapping 
code. Chee et al. show that the construction is already good for small pa- 
rameters. Indeed, they show that for binary codes, Construction [1] (with the 
best choice of k) achieves the best possible code size whenever n < 14 and 

It is less clear how to choose k in general so that C is as large as possible,
and what the resulting asymptotic size of the code is. Much of the paper
of Chee et al. sets out to answer these questions. Indeed, the authors argue
that when q is fixed, and k is chosen appropriately (as a function of n), we
have that

$$\liminf_{n \to \infty} \frac{|C|}{q^n/n} \geq \frac{q-1}{qe},$$

where e is the base of the natural logarithm. This shows that Theorem [1] 
is tight to within a constant factor when q is fixed. Their result uses a 
delicate argument using techniques from algebraic combinatorics. In fact, 
the following much simpler argument gives a similar, though weaker, result. 

Lemma 2. Let q be a fixed integer, $q \geq 2$. Then the codes in Construction 1
show that

$$\liminf_{n \to \infty} \frac{C(n,q)}{q^n/n} \geq \frac{(q-1)^2(2q-1)}{4q^4}.$$

Proof. We begin by claiming that when $2k \leq n-2$ the number of q-ary
sequences of length $n-k-2$ containing no k consecutive zeros is at least

$$q^{n-k-2} - nq^{n-2k-2}.$$

To see this, note that any sequence that fails the condition of containing no
k consecutive zeros must contain k consecutive zeros starting
at some position i, where $1 \leq i \leq n-k-2-(k-1)$. Since there are
$n-2k-1$ possibilities for i, and $q^{n-2k-2}$ sequences containing k zeros starting
at position i, our claim follows. Thus, if C is the non-overlapping code in
Construction 1,

$$|C| \geq (q-1)^2\left(q^{n-k-2} - nq^{n-2k-2}\right) = \left(\frac{q-1}{q}\right)^2 q^n\left(q^{-k} - nq^{-2k}\right).$$

The function $q^{-k} - nq^{-2k}$ is maximised when $k = \log_q(2n) + \delta$, where $\delta$ is
chosen so that $|\delta| < 1$ and k is an integer. In this case, the value of $q^{-k} - nq^{-2k}$
is bounded below by $(2q-1)/(4nq^2)$ (this can be shown by always taking $\delta$
to be non-negative). Thus

$$|C| \geq \frac{(q-1)^2(2q-1)}{4q^4} \cdot \frac{q^n}{n}.$$
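The elementary inequality used in the last step can be checked with exact rational arithmetic. This is a sketch under the assumption that $\delta$ is taken non-negative, so that k is the smallest integer with $q^k \geq 2n$; the helper names `best_k` and `f` are hypothetical. It checks only the inequality for the function $q^{-k} - nq^{-2k}$, not the side condition $2k \leq n-2$.

```python
from fractions import Fraction


def best_k(n: int, q: int) -> int:
    """Smallest integer k with q**k >= 2n, i.e. k = log_q(2n) + delta, 0 <= delta < 1."""
    k = 0
    while q ** k < 2 * n:
        k += 1
    return k


def f(n: int, q: int, k: int) -> Fraction:
    """The function q^(-k) - n q^(-2k), as an exact fraction."""
    return Fraction(1, q ** k) - Fraction(n, q ** (2 * k))


for q in (2, 3, 5, 10):
    for n in (4, 10, 100, 1000):
        k = best_k(n, q)
        lower = Fraction(2 * q - 1, 4 * n * q * q)
        print(q, n, k, f(n, q, k) >= lower)
```

Writing $y = 2n/q^k$, this choice of k gives $1/q < y \leq 1$ and $q^{-k} - nq^{-2k} = y(2-y)/(4n)$, which is increasing in y on that range; this is where the bound $(2q-1)/(4nq^2)$ comes from.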

When the alphabet size q is much larger than the length n, Construction 1
produces codes that are much smaller than the upper bound in Theorem 1.
The following generalisation of Construction 1 does not have this drawback;
we discuss this issue further in Sections 4 and 5 below.

Let $S \subseteq F^k$. We say that a word $x_1x_2\cdots x_r \in F^r$ is S-free if $r < k$, or if
$r \geq k$ and $x_ix_{i+1}\cdots x_{i+k-1} \notin S$ for all $i \in \{1, 2, \ldots, r-k+1\}$.

Construction 2. Let k and $\ell$ be such that $1 \leq k \leq n-1$ and $1 \leq \ell \leq q-1$.

Let $F = I \cup J$ be a partition of a set F of cardinality q into two parts I and
J of cardinalities $\ell$ and $q-\ell$ respectively. Let $S \subseteq I^k \subseteq F^k$. Let C be the set
of all words $c \in F^n$ such that:

• $c_1c_2\cdots c_k \in S$;

• $c_{k+1} \in J$, and $c_n \in J$;

• the word $c_{k+2}, c_{k+3}, \ldots, c_{n-1}$ is S-free.

Then C is a non-overlapping code.

It is easy to see that Construction 1 is the special case of Construction 2
with $\ell = 1$, $I = \{0\}$ and $S = \{0^k\}$.
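Construction 2 can be explored in the same brute-force way. The sketch below is an illustration under stated assumptions: symbols are integers, words are tuples, S is a set of k-tuples over I, and the function names are invented for this example. It builds the code by filtering all words of length n and then checks the non-overlapping property directly.

```python
from itertools import product


def is_S_free(word: tuple, S: set, k: int) -> bool:
    """True if no length-k window of word lies in S (vacuous if len(word) < k)."""
    return all(word[j : j + k] not in S for j in range(len(word) - k + 1))


def construction2(n: int, F: tuple, I: set, S: set, k: int) -> list:
    """Words c in F^n with c_1..c_k in S, c_{k+1} and c_n in J, middle part S-free."""
    J = {x for x in F if x not in I}
    code = []
    for c in product(F, repeat=n):
        if c[:k] in S and c[k] in J and c[-1] in J and is_S_free(c[k + 1 : n - 1], S, k):
            code.append(c)
    return code


def is_non_overlapping(code: list) -> bool:
    """Check that non-trivial prefixes and non-trivial suffixes are disjoint."""
    if not code:
        return True
    n = len(code[0])
    prefixes = {c[:t] for c in code for t in range(1, n)}
    suffixes = {c[-t:] for c in code for t in range(1, n)}
    return prefixes.isdisjoint(suffixes)


# Ternary example: I = {0, 1}, J = {2}, k = 2 and S = I^2.
I = {0, 1}
S = set(product(I, repeat=2))
ternary = construction2(6, (0, 1, 2), I, S, 2)
print(len(ternary), is_non_overlapping(ternary))  # 20 True

# Special case l = 1, I = {0}, S = {0^k}: this recovers Construction 1.
binary = construction2(6, (0, 1), {0}, {(0, 0)}, 2)
print(sorted("".join(map(str, c)) for c in binary))
```

In the ternary example there are 4 choices for the first two symbols, the third and sixth symbols must equal 2, and 5 of the 9 possible pairs in positions four and five avoid S, giving 20 codewords.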
4 Non-overlapping codes of small length



This section considers non-overlapping codes of fixed length n, when the
alphabet size q becomes large. In this situation, Construction 1 produces
codes that are much smaller than the upper bound in Theorem 1. To see this,
note that there are at most $q^{n-k}$ codewords in a code C from Construction 1,
since the first k components of any codeword are fixed. So, since k is positive,
$|C| \leq q^{n-1}$ and therefore $|C|/(q^n/n) \leq n/q$.

The proof of the following theorem shows that the codes given by
Construction 2 are within a constant factor of the bound in Theorem 1 whenever
n is fixed and $q \to \infty$.

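Theorem 3. Let n be a fixed positive integer, $n \geq 2$. Then

$$\liminf_{q \to \infty} \frac{C(n,q)}{q^n/n} \geq \left(\frac{n-1}{n}\right)^{n-1}.$$

Proof. We use Construction 2 in the special case when $k = n-1$ and $S = I^k$.
In this case (in the notation of Construction 2) C is the set of words whose
first $n-1$ components lie in I, and whose final component lies in J. So here

$$|C| = \ell^{n-1}(q-\ell).$$

Let $\ell = \lceil ((n-1)/n)q \rceil$. Since $q - \ell > (1/n)q - 1$, we find that

$$|C| \geq \left(\frac{n-1}{n}\right)^{n-1} q^{n-1}\left(\frac{q}{n} - 1\right) = \left(\frac{n-1}{n}\right)^{n-1}\frac{q^n}{n}\left(1 - \frac{n}{q}\right),$$

and so the theorem follows. □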
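The size $\ell^{n-1}(q-\ell)$ and the lower bound derived from it can be compared numerically. The sketch below uses exact fractions and the choice $\ell = \lceil ((n-1)/n)q \rceil$ from the proof; the function names are illustrative assumptions.

```python
from fractions import Fraction


def construction2_size(n: int, q: int) -> int:
    """l**(n-1) * (q - l) with l = ceil((n-1)q/n): the case k = n-1, S = I^(n-1)."""
    l = -(-(n - 1) * q // n)  # exact integer ceiling of (n-1)q/n
    return l ** (n - 1) * (q - l)


def normalised(n: int, q: int) -> Fraction:
    """Code size divided by q^n / n."""
    return Fraction(construction2_size(n, q) * n, q ** n)


for n in (2, 3, 4, 5):
    limit = Fraction(n - 1, n) ** (n - 1)
    for q in (10, 100, 1000):
        lower = limit * (1 - Fraction(n, q))
        print(n, q, float(normalised(n, q)), float(limit), normalised(n, q) >= lower)
```

As q grows with n fixed, the printed ratio approaches $((n-1)/n)^{n-1}$, in line with the statement of the theorem.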

The following theorem shows (in particular) that the bound $|C| < q^n/(2n-1)$
of Theorem 1 is not asymptotically tight when n = 2 and $q \to \infty$.

Theorem 4. A largest q-ary length 2 non-overlapping code has C(2,q)
codewords, where $C(2,q) = \lfloor q/2 \rfloor \lceil q/2 \rceil$. In particular,

$$\lim_{q \to \infty} \frac{C(2,q)}{q^2/2} = \frac{1}{2}.$$

Proof. Construction 2 in the case n = 2, k = 1, $\ell = \lfloor q/2 \rfloor$ and S = I
provides the lower bound on C(2,q) we require.

Let C be a q-ary non-overlapping code of length 2. Let I be the set of
symbols which occur in the first position of a codeword in C, and let J be
the set of symbols that occur in the final position of a codeword in C. Since
C is non-overlapping, I and J are disjoint. Thus

$$|C| \leq |I||J| \leq |I|(q-|I|) \leq \lfloor q/2 \rfloor \lceil q/2 \rceil.$$

□
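For very small alphabets the value of C(2,q) can be confirmed by exhaustive search over all subsets of the $q^2$ words of length 2. This is a sketch for illustration (the function name is an assumption); the search is exponential in $q^2$, so it is only feasible for q up to about 4.

```python
from itertools import product


def max_code_length2(q: int) -> int:
    """Size of a largest non-overlapping q-ary code of length 2, by brute force."""
    words = list(product(range(q), repeat=2))
    best = 0
    for mask in range(1 << len(words)):
        chosen = [w for j, w in enumerate(words) if (mask >> j) & 1]
        firsts = {a for a, _ in chosen}
        lasts = {b for _, b in chosen}
        # For n = 2 the only non-trivial prefixes and suffixes are single symbols.
        if firsts.isdisjoint(lasts):
            best = max(best, len(chosen))
    return best


for q in (2, 3, 4):
    print(q, max_code_length2(q), (q // 2) * ((q + 1) // 2))
```

The second and third printed columns agree for these values of q, matching the formula $\lfloor q/2 \rfloor \lceil q/2 \rceil$.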

In the following theorem, [x] denotes the nearest integer to the real
number x.

Theorem 5. A largest q-ary length 3 non-overlapping code has C(3,q)
codewords, where $C(3,q) = [2q/3]^2(q - [2q/3])$. In particular,

$$\lim_{q \to \infty} \frac{C(3,q)}{q^3/3} = \frac{4}{9}.$$

Proof. Construction 2 in the case n = 3, k = 2, $\ell = [2q/3]$ and $S = I^2$
provides the lower bound on C(3,q) we require.

Let C be a q-ary non-overlapping code of length 3 of maximal size. Let
F be the underlying alphabet of C, so |F| = q.

Let I be the set of symbols which occur in the first position of a codeword
in C. Let J be the complement of I in F, so $|J| = q - |I|$. Since C
is non-overlapping, the symbols that occur in the final component of any
codeword lie in J. So we may write C as a disjoint union $C = C_1 \cup C_2$, where
$C_1 \subseteq I \times I \times J$ and $C_2 \subseteq I \times J \times J$.

Let X be the set of all pairs $(b,c) \in I \times J$ such that $abc \in C$ for some
$a \in I$. Define

$$C'_1 = \{abc \mid a \in I \text{ and } (b,c) \in X\},$$

$$C'_2 = \{bcd \mid (b,c) \in (I \times J) \setminus X \text{ and } d \in J\}.$$

Clearly $C_1 \subseteq C'_1$. Moreover, $C_2 \subseteq C'_2$, since whenever $bcd \in C_2$ is a codeword,
the fact that C is non-overlapping implies that $(b,c) \notin X$. But $C' = C'_1 \cup C'_2$
is a non-overlapping code, and so $C = C'$ as C is maximal.
We have

$$|C| = |C'| = |X||I| + (|I||J| - |X|)|J| = |X|(|I| - |J|) + |I||J|^2.$$

If $|I| \leq |J|$, then the maximum value of |C| is achieved when |X| = 0, at
$\max_{\ell \in \{1,2,\ldots,\lfloor q/2 \rfloor\}} \ell(q-\ell)^2$. If $|I| > |J|$ the maximum value of |C| is achieved
when $|X| = |I||J|$, at $\max_{\ell \in \{\lfloor q/2 \rfloor, \lfloor q/2 \rfloor + 1, \ldots, q-1\}} \ell^2(q-\ell)$. Thus

$$|C| \leq \max_{\ell \in \{1,2,\ldots,q-1\}} \ell^2(q-\ell) = [2q/3]^2(q-[2q/3]),$$

and so the theorem follows.

□
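The final maximisation can be checked numerically. The sketch below assumes the nearest integer to 2q/3 is computed as floor(2q/3 + 1/2), which is unambiguous because 2q/3 is never a half-integer; the function names are illustrative.

```python
def nearest_int_2q_over_3(q: int) -> int:
    """Nearest integer to 2q/3, computed exactly as floor(2q/3 + 1/2)."""
    return (4 * q + 3) // 6


def theorem5_value(q: int) -> int:
    """The closed form [2q/3]^2 (q - [2q/3])."""
    l = nearest_int_2q_over_3(q)
    return l * l * (q - l)


# Compare the closed form with a direct maximisation of l^2 (q - l).
for q in range(2, 200):
    assert theorem5_value(q) == max(l * l * (q - l) for l in range(1, q))

# The normalised size approaches 4/9 as q grows.
for q in (10, 100, 1000, 3000):
    print(q, 3 * theorem5_value(q) / q ** 3)
```

The assertion loop confirms that the integer maximiser of $\ell^2(q-\ell)$ is the nearest integer to 2q/3 for every q in the tested range.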



It would be interesting to determine the asymptotic behaviour of C(n,q)
when $q \to \infty$ for a general fixed length n. I believe the following two
conjectures are true.

Conjecture 1. Let n be an integer such that $n \geq 2$. Then

$$\lim_{q \to \infty} \frac{C(n,q)}{q^n/n} = \left(\frac{n-1}{n}\right)^{n-1}.$$

The following conjecture implies Conjecture 1.

Conjecture 2. Let n be an integer such that $n \geq 2$. For all sufficiently
large integers q, a largest q-ary non-overlapping code of length n is given by
Construction 2 in the case $k = n-1$ (and some value of $\ell$).

5 Good constructions for general parameters 

This section shows that Construction [2] is always good, in the sense that it 
produces non-overlapping codes of cardinality within a constant factor of the 
upper bound given by Theorem [1] for all parameters. This is implied by the 
proof of the following theorem. 

Theorem 6. There exist absolute constants $c_1$ and $c_2$ such that

$$c_1(q^n/n) \leq C(n,q) \leq c_2(q^n/n)$$

for all integers n and q with $n \geq 2$ and $q \geq 2$.

Proof. The existence of $c_2$ follows by the upper bound on C(n,q) given by
Theorem 1. We prove the lower bound by showing that there exists a constant
$c_1$ such that for all choices of n and q, one of the constructions given by
Construction 2 contains at least $c_1(q^n/n)$ codewords.

Let $(n_1,q_1), (n_2,q_2), \ldots$ be an infinite sequence of pairs of integers where
$n_i \geq 2$ and $q_i \geq 2$. It suffices to show that $C(n_i,q_i)/(q_i^{n_i}/n_i)$ is always
bounded below by some positive constant as $i \to \infty$. Suppose, for a
contradiction, that this is not the case. By passing to a suitable subsequence
if necessary, we may assume that $C(n_i,q_i)/(q_i^{n_i}/n_i) \to 0$ as $i \to \infty$. If the
integers $q_i$ are bounded, then Lemma 2 gives a contradiction. If the
integers $n_i$ are bounded, we again have a contradiction, by Theorem 3. So we
may assume, without loss of generality, that the integer sequences $(n_i)$ and
$(q_i)$ are unbounded. By passing to a suitable subsequence if necessary, we
may therefore assume that $(n_i)$ and $(q_i)$ are strictly increasing sequences (and
that $n_i$ and $q_i$ are sufficiently large for our purposes below). In particular,
we may assume that $n_i \to \infty$ and $q_i \to \infty$ as $i \to \infty$.

Let $k_i = \lceil \log_2 2n_i \rceil$, and set $s_i = \lfloor q_i^{k_i}/(2n_i) \rfloor$. Let $F_i$ be a set of size $q_i$.
Let $I_i \subseteq F_i$ have cardinality $\ell_i$, where $\ell_i = \lceil s_i^{1/k_i} \rceil$. Let $J_i$ be the complement
of $I_i$ in $F_i$. Let $S_i$ be a subset of $I_i^{k_i}$ of cardinality $s_i$. Note that such a set
$S_i$ exists, by our choice of $\ell_i$.
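The parameter choice can be computed exactly with integer arithmetic. The sketch below rests on an assumption about the bracket notation: k is read as a ceiling, s as a floor and $\ell$ as a ceiling. The function names are hypothetical. It returns (k, s, l) for a single pair (n, q) and allows the existence condition $\ell^k \geq s$ to be checked.

```python
def iroot_ceil(s: int, k: int) -> int:
    """Smallest non-negative integer l with l**k >= s."""
    l = 0
    while l ** k < s:
        l += 1
    return l


def parameters(n: int, q: int) -> tuple:
    """Return (k, s, l) with k = ceil(log2(2n)), s = floor(q^k / 2n), l = ceil(s^(1/k))."""
    k = (2 * n - 1).bit_length()  # exact ceil(log2(2n)) for n >= 1
    s = q ** k // (2 * n)
    l = iroot_ceil(s, k)
    return k, s, l


k, s, l = parameters(10, 16)
print(k, s, l)                 # 5 52428 9
print(l ** k >= s)             # a subset S of I^k with |S| = s exists
print(12 * (16 - l) ** 2 >= 16 ** 2)  # (q - l)^2 >= q^2 / 12 for this pair
```

For n = 10 and q = 16 the set I has 9 elements, so $I^5$ has 59049 words, which is enough to choose 52428 of them for S.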

Let $C_i$ be the $q_i$-ary non-overlapping code of length $n_i$ given by
Construction 2 in the case $k = k_i$, $\ell = \ell_i$, $I = I_i$, $J = J_i$ and $S = S_i$. Then

$$|C_i| = |S_i|(q_i - \ell_i)^2 f_i, \qquad (2)$$

where $f_i$ is the number of $S_i$-free sequences of length $n_i - k_i - 2$. We now aim
to find a lower bound on $|C_i|$.

Since $q_i \to \infty$ as $i \to \infty$, we see that

$$q_i^{k_i}/(2n_i) \to \infty.$$

Hence

$$|S_i| \sim q_i^{k_i}/(2n_i) \qquad (3)$$

as $i \to \infty$.
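Note that

$$(2n_i)^{1/k_i} \geq 2^{\log_2(2n_i)/(2\log_2(2n_i))} = 2^{1/2}$$

and hence

$$\ell_i \leq \left\lceil \frac{q_i}{(2n_i)^{1/k_i}} \right\rceil \leq \left\lceil 2^{-1/2} q_i \right\rceil.$$

Since $(1 - 2^{-1/2})^2 > 1/12$, we see that

$$(q_i - \ell_i)^2 \geq (1/12)\, q_i^2 \qquad (4)$$

for all sufficiently large i.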

The number of S-free q-ary sequences of length r is at least $q^r - (r-k+1)|S|q^{r-k}$,
since every word that is not S-free must contain an element of S
somewhere as a subword. So the number of S-free q-ary sequences of length
r is at least $q^r - r|S|q^{r-k} = q^r(1 - r|S|q^{-k})$. Thus

$$f_i \geq q_i^{n_i-k_i-2}\left(1 - n_i|S_i|q_i^{-k_i}\right) \sim \tfrac{1}{2}\, q_i^{n_i-k_i-2}, \qquad (5)$$

the last step following from (3).
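The union bound on S-free sequences can be compared with an exact count for small parameters. The following is a sketch with an arbitrarily chosen set S and invented function names; it enumerates all words of length r, so it is only feasible for small q and r.

```python
from itertools import product


def count_S_free(q: int, r: int, S: set, k: int) -> int:
    """Exact number of words of length r over {0,...,q-1} with no window in S."""
    count = 0
    for w in product(range(q), repeat=r):
        if all(w[j : j + k] not in S for j in range(r - k + 1)):
            count += 1
    return count


def union_bound(q: int, r: int, size_S: int, k: int) -> int:
    """Lower bound q^r - (r-k+1)|S| q^(r-k), valid for r >= k."""
    return q ** r - (r - k + 1) * size_S * q ** (r - k)


q, k = 3, 2
S = {(0, 0), (0, 1)}  # an arbitrary illustrative choice of S
for r in range(k, 7):
    exact = count_S_free(q, r, S, k)
    bound = union_bound(q, r, len(S), k)
    print(r, exact, bound, exact >= bound)
```

The bound is only informative when $(r-k+1)|S|$ is small compared with $q^k$, which is why the proof chooses $|S_i|$ to be roughly $q_i^{k_i}/(2n_i)$.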

Now (3), (4) and (5) combine with (2) to show that $|C_i| > (1/50)(q_i^{n_i}/n_i)$
for all sufficiently large i. This contradiction completes the proof of the
theorem. □